JJE: Applying Graph Flow Theory to Wikipedia

Authors

  • Jonathan McElroy
  • Bryan Clevenger
Abstract

Data in the world, and more specifically on the Internet, is growing to massive sizes. To make this information more useful, it must first be made more accessible. The INEX Initiative competition aims to identify and compare methodologies for categorizing information into clusters. The competition will be run on 60 gigabytes of data from Wikipedia, with the ultimate goal of accurate categorization and clustering in order to reduce search time through this data. In this work, we construct a clustering algorithm based on the link structure of a subset of the underlying pages. The resulting webgraph is pruned using a max-flow min-cut algorithm [10, 2, 8], which is initially seeded using different heuristics. We compare search space reduction results and construct a visualization of the clustered documents. We were able to generate clusters on the INEX data set as well as visualizations of clustered data on several different datasets.
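
The abstract does not specify which flow algorithm, edge capacities, or seeding heuristics the authors used, so the following is only a minimal sketch of the general idea: run max flow between a seed page and a sink page on a link graph, then treat the source side of the resulting minimum cut as one cluster. The function name `min_cut_cluster`, the unit capacities, the toy page names, and the choice of Edmonds-Karp (breadth-first augmenting paths) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def min_cut_cluster(edges, source, sink):
    """Return (max_flow, nodes on the source side of a minimum cut).

    `edges` is a list of (u, v, capacity) tuples, treated as undirected
    links. Hypothetical helper; not taken from the paper.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Residual capacities; each link is usable in both directions.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += cap

    def reachable_parents():
        # Breadth-first search along edges with remaining capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = reachable_parents()
        if sink not in parents:
            # No augmenting path remains: the nodes still reachable
            # from the source form the source side of a minimum cut.
            return flow, set(parents)

        # Find the bottleneck capacity along the discovered path.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            u = parents[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push flow along the path and update residual capacities.
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy link graph: two tightly linked triangles joined by one link (C-D).
links = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
flow, cluster = min_cut_cluster(links, "A", "F")
print(flow, sorted(cluster))
```

In this toy graph the only link between the two triangles is the cheapest place to cut, so the seed page `A` ends up grouped with `B` and `C`. On a real webgraph the choice of seed and sink, which the abstract says is driven by heuristics, determines which clusters come out.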

Similar resources

Analysis of Resting-State fMRI Topological Graph Theory Properties in Methamphetamine Drug Users Applying Box-Counting Fractal Dimension

Introduction: Graph theoretical analysis of functional Magnetic Resonance Imaging (fMRI) data has provided new measures of mapping human brain in vivo. Of all methods to measure the functional connectivity between regions, Linear Correlation (LC) calculation of activity time series of the brain regions as a linear measure is considered the most ubiquitous one. The strength of the dependence obl...

Wikipedia graph mining: dynamic structure of collective memory

Wikipedia is the largest encyclopedia ever created and the fifth most visited website in the world. Tens of millions of people surf it every day, seeking answers to various questions. Collective user activity on the pages leaves publicly available footprints of human behavior, making Wikipedia a great source of data for large-scale analysis of collective dynamical patterns. The dyna...

Assessing the Ecological Connectivity of Urban Green Patches Using Graph Theory: A Case Study of the Ahvaz Metropolis

Connectivity of urban green patches is an important structural attribute of the urban landscape that facilitates species movement and the transfer of genes among habitats. So far, several methods, including graph theory, have been applied to assess ecological connectivity. This research aimed to study the application of graph theory to measure the connectivity of green patches in the...

Extracting Semantic Information from Wikipedia Using Human Computation and Dimensionality Reduction

Semantic background knowledge is crucial for many intelligent applications. A classical way to represent such knowledge is through semantic networks. Wikipedia’s hyperlink graph can be considered a primitive semantic network, since the links it contains usually correspond to semantic relationships between the articles they connect. However, Wikipedia is rather noisy in this function. We propose...

Notes on NP Completeness

Here are some notes which I wrote to try to understand what NP completeness means. Most of these notes are taken from Appendix B in Douglas West’s graph theory book, and also from wikipedia. There’s nothing remotely original about these notes. I just wanted all this material to filter through my brain and onto paper. I also wanted to collect everything together in a way I like. Here’s what thes...

Publication date: 2009